Automatic Acquisition of Phrase Grammars for Stochastic Language Modeling

Authors

  • Giuseppe Riccardi
  • Srinivas Bangalore
Abstract

Phrase-based language models have been recognized to have an advantage over word-based language models since they allow us to capture long-spanning dependencies. Class-based language models have been used to improve model generalization and overcome problems with data sparseness. In this paper we present a novel approach for combining the phrase acquisition with class construction process to automatically acquire phrase-grammar fragments from a given corpus. The phrase-grammar learning is decomposed into two sub-problems, namely the phrase acquisition and feature selection. The phrase acquisition is based on entropy minimization and the feature selection is driven by the entropy reduction principle. We further demonstrate that the phrase-grammar based n-gram language model significantly outperforms a phrase-based n-gram language model in an end-to-end evaluation of a spoken language application.
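
The abstract does not spell out the acquisition procedure, so the following is only a minimal sketch of what entropy-driven phrase acquisition can look like, not the authors' algorithm. It assumes a deliberately simple criterion: greedily merge the adjacent word pair whose merge most reduces the number of bits needed to encode the corpus under its own unigram model. The function names (`total_bits`, `merge_pair`, `acquire_phrases`), the `min_count` threshold, and the toy corpus are all hypothetical.

```python
import math
from collections import Counter


def total_bits(sentences):
    """Bits needed to encode the corpus under its own maximum-likelihood unigram model."""
    counts = Counter(tok for sent in sentences for tok in sent)
    n = sum(counts.values())
    return -sum(c * math.log2(c / n) for c in counts.values())


def merge_pair(sentences, pair):
    """Rewrite each non-overlapping occurrence of `pair` as a single phrase token."""
    joined = "_".join(pair)
    out = []
    for sent in sentences:
        new, i = [], 0
        while i < len(sent):
            if i + 1 < len(sent) and (sent[i], sent[i + 1]) == pair:
                new.append(joined)
                i += 2
            else:
                new.append(sent[i])
                i += 1
        out.append(new)
    return out


def acquire_phrases(sentences, max_phrases=5, min_count=2):
    """Greedy loop: keep the merge that lowers the corpus code length the most."""
    phrases = []
    best_bits = total_bits(sentences)
    for _ in range(max_phrases):
        pair_counts = Counter(
            (a, b) for sent in sentences for a, b in zip(sent, sent[1:])
        )
        # Assumption: only pairs seen at least `min_count` times are candidates.
        candidates = [p for p, c in pair_counts.items() if c >= min_count]
        scored = [(total_bits(merge_pair(sentences, p)), p) for p in candidates]
        if not scored:
            break
        bits, pair = min(scored)
        if bits >= best_bits:
            break  # no candidate reduces the code length any further
        sentences = merge_pair(sentences, pair)
        best_bits = bits
        phrases.append("_".join(pair))
    return phrases, sentences


# Toy corpus, invented purely for illustration.
corpus = [
    "i want to fly to new york".split(),
    "flights to new york please".split(),
    "new york to boston".split(),
]
phrases, rewritten = acquire_phrases(corpus)
print(phrases)
print(total_bits(corpus), total_bits(rewritten))
```

A unigram code length is a crude stand-in for the language-model entropy the abstract refers to. A closer approximation would score each candidate merge with an n-gram model evaluated on held-out data, and would add a class-construction step that this sketch omits entirely.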

Similar resources

Treebank-Based Probabilistic Phrase Structure Parsing

The area of probabilistic phrase structure parsing has been a central and active field in computational linguistics. Stochastic methods in natural language processing, in general, have become very popular as more and more resources become available. One of the main advantages of probabilistic parsing is in disambiguation: it is useful for a parsing system to return a ranked list of potential sy...

Language model acquisition from a text corpus for speech understanding

Speech understanding can be viewed as a problem of translating input natural language of speech recognition results into output semantic language. This paper describes automatic acquisition of a language model for translating natural language into semantic language from a text corpus using a stochastic method. The method estimates co-occurrence probabilities of input and output grammar rules as...

Phrase Structure in a Computational Model of Child Language Acquisition

The problem of the acquisition of morpho-syntactic rules, as addressed by a number of existing computational models, is introduced. A distinction is made between ‘innatist’ models which presuppose the importance of innate linguistic knowledge (specifically, syntactic categories and X-Bar Theory), and ‘empiricist’ models, which reject such assumptions. It is argued that ‘empiricist’ models bette...

Finite-State Approximations of Grammars

Grammars for spoken language systems are subject to the conflicting requirements of language modeling for recognition and of language analysis for sentence interpretation. Current recognition algorithms can most directly use finite-state acceptor (FSA) language models. However, these models are inadequate for language interpretation, since they cannot express the relevant syntactic and semantic...

Computation of the Probability of the Best Derivation of an Initial Substring from a Stochastic Context-Free Grammar

Recently, Stochastic Context-Free Grammars have been considered important for use in Language Modeling for Automatic Speech Recognition tasks [6, 10]. In [6], Jelinek and Lafferty presented and solved the problem of computation of the probability of initial substring generation by using Stochastic Context-Free Grammars. This paper seeks to apply a Viterbi scheme to achieve the computation of th...

Publication date: 1998